POINTWISE CONVERGENCE FOR CUBIC AND POLYNOMIAL MULTIPLE ERGODIC AVERAGES OF NON-COMMUTING TRANSFORMATIONS

Abstract. We study the limiting behavior of multiple ergodic averages involving several not
necessarily commuting measure preserving transformations. We work on two types of averages,
one that uses iterates along combinatorial parallelepipeds, and another that uses iterates along
shifted polynomials. We prove pointwise convergence in both cases, thus answering a question
of I. Assani in the former case, and extending results of B. Host-B. Kra and A. Leibman in
the latter case. Our argument is based on some elementary uniformity estimates of general
bounded sequences, decomposition results in ergodic theory, and equidistribution results on
nilmanifolds.



1. Main results 

In this paper we study the limiting behavior, in the mean and pointwise, of multiple ergodic
averages involving measure preserving transformations that do not necessarily commute. We
focus our attention on two such types, special cases of which have previously attracted some
attention. One involves iterates taken along combinatorial parallelepipeds, and the other involves
iterates taken along shifted polynomials.

1.1. Cubic averages. For $k \in \mathbb{N}$ we set

$$V_k := \{0,1\}^k \quad \text{and} \quad V_k^* := V_k \setminus \{\mathbf{0}\},$$

where $\mathbf{0} := (0, 0, \ldots, 0)$. Let $(X, \mathcal{X}, \mu)$ be a probability space, and for $\epsilon \in V_k^*$ let $T_\epsilon: X \to X$
be measure preserving transformations and $f_\epsilon \in L^\infty(\mu)$ be functions. We are going to study the
limiting behavior of certain multiple ergodic averages taken along $k$-dimensional combinatorial
parallelepipeds of iterates of the transformations $T_\epsilon$. More precisely, the cubic averages of
dimension $k$ are given by

(1) $$A_{k,N}(T_\epsilon, f_\epsilon)(x) := \frac{1}{N^k} \sum_{n \in [1,N]^k} \prod_{\epsilon \in V_k^*} f_\epsilon(T_\epsilon^{\epsilon \cdot n} x),$$

where for $\epsilon = (\epsilon_1, \ldots, \epsilon_k) \in V_k$ and $n = (n_1, \ldots, n_k) \in \mathbb{N}^k$ we define

$$\epsilon \cdot n := \epsilon_1 n_1 + \cdots + \epsilon_k n_k.$$

For instance, the cubic averages of dimension 1 are the ergodic averages, the cubic averages of
dimension 2 are defined by

$$\frac{1}{N^2} \sum_{1 \le m,n \le N} f_1(T_1^m x) \cdot f_2(T_2^n x) \cdot f_3(T_3^{m+n} x),$$

and the cubic averages of dimension 3 are similarly defined, using iterates of 7 transformations
taken along the combinatorial parallelepipeds $m, n, r, m+n, m+r, n+r, m+n+r$.
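
To make definition (1) concrete, the following is a minimal sketch that evaluates a cubic average on a finite toy system. It assumes, purely for illustration, that each transformation is a permutation of $\{0, \ldots, M-1\}$ stored as a list; the helper names `cube_vertices`, `iterate` and `cubic_average` are hypothetical and do not come from the paper.

```python
from itertools import product


def cube_vertices(k):
    """Return V_k^* = {0,1}^k minus the zero vector, as a list of tuples."""
    return [eps for eps in product((0, 1), repeat=k) if any(eps)]


def iterate(perm, x, t):
    """Apply the permutation `perm` (a list acting on {0..M-1}) t times to x."""
    for _ in range(t):
        x = perm[x]
    return x


def cubic_average(k, N, T, f, x):
    """Evaluate A_{k,N}(T_eps, f_eps)(x) from (1).

    T and f are dicts keyed by the vertices eps in V_k^*:
    T[eps] is a permutation (toy measure preserving map),
    f[eps] is a function on {0..M-1}.
    """
    vertices = cube_vertices(k)
    total = 0.0
    for n in product(range(1, N + 1), repeat=k):
        term = 1.0
        for eps in vertices:
            t = sum(e * m for e, m in zip(eps, n))  # the exponent eps . n
            term *= f[eps](iterate(T[eps], x, t))
        total += term
    return total / N ** k
```

For $k = 1$ the function reduces to an ordinary ergodic average, and for $k = 3$ the list `cube_vertices(3)` has the 7 entries that correspond to the iterates $m, n, r, m+n, m+r, n+r, m+n+r$.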

The averages $A_{k,N}(T_\epsilon, f_\epsilon)$ are closely linked to the Gowers-Host-Kra seminorms $|||\cdot|||_k$
that have been used extensively in ergodic theory to find convenient majorants for various
other multiple ergodic averages. In [13] it is shown that for ergodic systems $(X, \mathcal{X}, \mu, T)$, and
real valued functions $f \in L^\infty(\mu)$, we have

$$\lim_{N \to \infty} \int f \cdot A_{k,N}(T, f) \, d\mu = |||f|||_k^{2^k},$$

where $A_{k,N}(T, f)$ is defined by letting $T_\epsilon := T$ and $f_\epsilon := f$ in (1) for every $\epsilon \in V_k^*$. This identity
also holds for non-ergodic systems once the seminorms $|||\cdot|||_k$ are appropriately defined.

The study of the limiting behavior of the averages (1) was initiated by V. Bergelson in [6],
where convergence in $L^2(\mu)$ was shown in dimension 2 under the extra assumption that all the
transformations are equal. Under the same assumption, Bergelson's result was extended by
B. Host and B. Kra for cubic averages of dimension 3 in [13], and for arbitrary dimension $k$
in [14]. More recently in [3], I. Assani established pointwise convergence for cubic averages of
arbitrary dimension $k$ when all the transformations are equal. In the same article, and prior
to this in [1] and [2], convergence for general, not necessarily commuting transformations, was
studied for the first time. Pointwise convergence was established for 2-dimensional averages,
and some partial results were obtained for dimensions greater than 2, including convergence
when all the transformations are weak mixing. In this article we complete this study by proving
pointwise convergence for the cubic averages of arbitrary dimension.

Theorem 1.1. Let $k \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space, and for $\epsilon \in V_k^*$ let $T_\epsilon: X \to X$ be
measure preserving transformations, and $f_\epsilon \in L^\infty(\mu)$ be functions. Then the cubic averages of
dimension $k$, given by (1), converge pointwise as $N \to \infty$.

It is interesting to contrast the limiting behavior of the cubic averages with some other
similar looking averages. To begin with, the averages $\frac{1}{(N-M)^2} \sum_{M \le m,n < N} f_1(T_1^m x) \cdot f_2(T_2^n x) \cdot f_3(T_3^{m+n} x)$,
and their higher dimensional relatives, do not in general converge pointwise (for
an example when $f_2 = f_3 = 1$ see [17]). On the other hand, our argument can be easily adapted
to prove convergence in $L^2(\mu)$ for such averages. As for the averages $\frac{1}{N^2} \sum_{1 \le m,n \le N} f_1(T^m x) \cdot f_2(S^n x) \cdot f_3(T^m S^n x)$,
and the "diagonal averages" $\frac{1}{N} \sum_{n=1}^N f_1(T^n x) \cdot f_2(S^n x)$, it is known
that they do not converge in general, even in $L^2(\mu)$, unless one makes some commutativity
assumption about the transformations $T$ and $S$ (for counterexamples, see [20] for the former,
and [4] or [9] for the latter). In fact, even under the assumption that all transformations
commute, pointwise convergence of these averages and their higher dimensional relatives is not
known.

A key concept that underlies the convergence result of Theorem 1.1 is that of characteristic
factors, meaning a collection of $T_\epsilon$-invariant sub-$\sigma$-algebras $\mathcal{Y}_\epsilon$, having the property that the
difference $A_{k,N}(T_\epsilon, f_\epsilon)(x) - A_{k,N}(T_\epsilon, \tilde{f}_\epsilon)(x)$, where $\tilde{f}_\epsilon = \mathbb{E}(f_\epsilon | \mathcal{Y}_\epsilon)$, converges pointwise to 0.
Our main goal is to make a suitable choice so that the corresponding factor systems have very
special algebraic structure. This is done by controlling the averages $A_{k,N}(T_\epsilon, f_\epsilon)$ by certain
seminorms (their precise definition is given in Section 2.2), thus obtaining the following result:

Theorem 1.2. Let $k \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space, and for $\epsilon \in V_k^*$ let $T_\epsilon: X \to X$ be
measure preserving transformations, and $f_\epsilon \in L^\infty(\mu)$ be functions. Furthermore, suppose that
$|||f_\epsilon|||_{k,T_\epsilon} = 0$ for some $\epsilon \in V_k^*$. Then the cubic averages of dimension $k$, given by (1), converge
pointwise to 0 as $N \to \infty$.

In fact we give explicit bounds relating the pointwise limiting behavior of the cubic averages
(1) and the seminorms $|||f_\epsilon|||_{k,T_\epsilon}$ (see Corollary 3.7).

Using different terminology, Theorem 1.2 states that the factors $\mathcal{Z}_{k-1,T_\epsilon}$, defined in
Section 2.4, are characteristic factors for pointwise convergence of the averages (1).

To prove Theorem 1.2 we simplify and extend to our particular context an argument given by
Assani in [3]. To prove Theorem 1.1 we combine Theorem 1.2 with the decomposition result of
Proposition 3.8 (which was proved in [10] using the structure theorem of [14]). We eventually
reduce matters to a known convergence property of nilsequences (all notions are defined in
Section 2).

1.2. Polynomial averages. We are going to generalize some convergence results of B. Host
and B. Kra [15] and A. Leibman [22] that involve multiple ergodic averages of a single
transformation to the case that involves several not necessarily commuting transformations.

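Theorem 1.3. Let $\ell \in \mathbb{N}$, and $(X, \mathcal{X}, \mu)$ be a probability space. For $i = 1, \ldots, \ell$ let $T_i: X \to X$
be measure preserving transformations, $f_i \in L^\infty(\mu)$ be functions, $p_i \in \mathbb{Z}[t]$ be non-constant
polynomials such that $p_i - p_j$ is non-constant for $i \ne j$, and $b: \mathbb{N} \to \mathbb{N}$ be a sequence such that
$b(N) \to \infty$ and $b(N)/N^{1/d} \to 0$ as $N \to \infty$, where $d$ is the maximum degree of the polynomials
$p_i$ (see footnote 1). Then the averages

(2) $$\frac{1}{N b(N)} \sum_{1 \le m \le N, \, 1 \le n \le b(N)} f_1(T_1^{m + p_1(n)} x) \cdot \ldots \cdot f_\ell(T_\ell^{m + p_\ell(n)} x)$$

converge pointwise as $N \to \infty$.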

Using this result for $\ell + 1$ in place of $\ell$, letting $T_0 = \cdots = T_\ell = T$, $p_0 = 0$, and integrating
with respect to $\mu$, we deduce that the averages

(3) $$\frac{1}{N} \sum_{n=1}^N f_1(T^{p_1(n)} x) \cdot \ldots \cdot f_\ell(T^{p_\ell(n)} x)$$

converge weakly in $L^2(\mu)$ as $N \to \infty$. This recovers one of the main results from [15]. Let us
remark though that we were not able to deduce from Theorem 1.3 anything useful regarding
the well known open problem of convergence (weakly, in the mean, or pointwise) of the averages
$\frac{1}{N} \sum_{n=1}^N f_1(T_1^{p_1(n)} x) \cdot \ldots \cdot f_\ell(T_\ell^{p_\ell(n)} x)$ for general commuting measure preserving transformations
$T_1, \ldots, T_\ell$.

A key ingredient in the proof of Theorem 1.3 is the following result; it plays the same role
Theorem 1.2 plays in the proof of Theorem 1.1.

Theorem 1.4. Under the assumptions of Theorem 1.3 there exists $k \in \mathbb{N}$, depending only on
$\ell$ and the maximum degree of the polynomials $p_1, \ldots, p_\ell$, such that: If $|||f_i|||_{k,T_i} = 0$ for some
$i \in \{1, \ldots, \ell\}$, then the averages

(4) $$\frac{1}{N} \sum_{m=1}^N \frac{1}{b(N)} \sum_{n=1}^{b(N)} f_1(T_1^{m + p_1(n)} x) \cdot \ldots \cdot f_\ell(T_\ell^{m + p_\ell(n)} x)$$

converge pointwise to 0 as $N \to \infty$.



Footnote 1. The second condition on $b(N)$ in Theorem 1.3 guarantees that the contribution of several boundary terms is negligible. For instance,
for every bounded sequence $(a(n))_{n \in \mathbb{N}}$ and polynomial $p \in \mathbb{Z}[t]$ with degree at most $d$, the difference of the
averages $\mathbb{E}_{1 \le n \le b(N), \, p(n) < m \le N + p(n)} \, a(m)$ and $\mathbb{E}_{1 \le n \le b(N), \, 1 \le m \le N} \, a(m)$ goes to 0 as $N \to \infty$.

It follows at once that the factors $\mathcal{Z}_{k-1,T_i}$, defined in Section 2, are characteristic factors for
pointwise convergence of the averages (2) and (4).

Using Theorem 1.4 for $T_1 = \cdots = T_\ell = T$, and integrating with respect to $\mu$, we deduce
that there exists $k \in \mathbb{N}$ such that if $|||f_i|||_{k,T} = 0$ for some $i \in \{1, \ldots, \ell\}$, then the averages (3)
converge to 0 in $L^2(\mu)$ as $N \to \infty$. This recovers one of the main results from [22] needed to
prove convergence in $L^2(\mu)$ for the averages (3).
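
The boundary-term remark attached to Theorem 1.3 can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: the test sequence `a`, the polynomial `p`, and the helper name `boundary_gap` are hypothetical, and the sequence is assumed to be bounded by 1 with $p(n) \ge 0$. For each fixed $n$ the two inner sums differ in at most $2p(n)$ terms, so the gap is at most $2 \max_{n \le b} p(n) / N$.

```python
import math


def boundary_gap(a, p, N, b):
    """Absolute difference between the shifted and unshifted double averages.

    a: function m -> number with |a(m)| <= 1 (hypothetical test sequence)
    p: function n -> int with p(n) >= 0 for n in [1, b]
    Shifted average: m ranges over (p(n), N + p(n)]; plain average: m in [1, N].
    """
    shifted = 0.0
    plain = 0.0
    for n in range(1, b + 1):
        shifted += sum(a(m) for m in range(p(n) + 1, N + p(n) + 1))
        plain += sum(a(m) for m in range(1, N + 1))
    return abs(shifted - plain) / (N * b)


# Example: d = 2 and b(N) = floor(N^(1/4)), so that b(N) / N^(1/2) -> 0.
N = 2000
b = int(N ** 0.25)
gap = boundary_gap(lambda m: math.cos(m * m), lambda n: n ** 2, N, b)
```

With these choices the gap is bounded by $2 b(N)^2 / N$, which tends to 0 precisely because $b(N)/N^{1/d} \to 0$.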

1.3. Open problems related to multiple recurrence. We state some multiple recurrence
problems that are naturally related to the previously established convergence results.
Historically, recurrence problems have turned out to be easier to establish than the corresponding
convergence problems, but this does not seem to be the case in our current setup.

Problem 1. Let $k \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space, and for $\epsilon \in V_k^*$ let $T_\epsilon: X \to X$ be
measure preserving transformations. Is it true that for every $A \in \mathcal{X}$ with $\mu(A) > 0$ there exists
$n \in \mathbb{N}^k$ such that

$$\mu\Big(A \cap \bigcap_{\epsilon \in V_k^*} T_\epsilon^{-\epsilon \cdot n} A\Big) > 0 \, ?$$

We believe that the answer is positive. When all the transformations commute this is indeed
the case. Furthermore, the answer is positive when all the transformations are weak mixing
since in this case the corresponding averages converge to $(\mu(A))^{2^k}$ (see [3], or use Theorem 1.2
in the current article). In general, even the case $k = 2$ is open. Namely, it is not known
whether if $T, S, R$ are measure preserving transformations acting on the same probability space
$(X, \mathcal{X}, \mu)$, and $A \in \mathcal{X}$ satisfies $\mu(A) > 0$, then there exist $m, n \in \mathbb{N}$ such that

(5) $$\mu(A \cap T^{-m} A \cap S^{-n} A \cap R^{-(m+n)} A) > 0.$$

This problem was first studied by Assani in [2]. We remark that using Theorem 1.2 one can
reduce matters to verifying this multiple recurrence property for very special systems (namely,
systems whose ergodic components are rotations on compact abelian groups), but we were not able
to handle this seemingly simple case. The non-ergodicity of the transformations causes serious
problems, and another obstacle (that becomes more serious in dimension higher than 2) is that
it is not clear why various approximation arguments that one would like to use preserve the
recurrence property (5). Interestingly, we were able to overcome the analogous problems for
questions pertaining to convergence. Let us also remark that in general no power of $\mu(A)$ can
be used as a lower bound for the multiple intersections in (5). To see this let $S = T^{-2}$, $R = T^2$
and factor out the transformation $T^{-2n}$; then the left hand side in (5) is at most
$\mu(A \cap T^{-(m+2n)} A \cap T^{-2(m+2n)} A)$, and it is known that in general no power of $\mu(A)$ can be used
as a lower bound for these expressions (see Theorem 2.1 in [7]).

Problem 2. Let $\ell \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space, and $T_1, \ldots, T_\ell$ be measure preserving
transformations acting on $X$. Furthermore, let $p_1, \ldots, p_\ell$ be distinct polynomials with integer
coefficients that satisfy $p_i(0) = 0$ for $i = 1, \ldots, \ell$. Is it true that for every $A \in \mathcal{X}$ with $\mu(A) > 0$
there exist $m, n \in \mathbb{N}$ such that

$$\mu(A \cap T_1^{-m - p_1(n)} A \cap \cdots \cap T_\ell^{-m - p_\ell(n)} A) > 0 \, ?$$

Again, we believe that the answer is positive. Notice that the case where $T_1 = \cdots = T_\ell$
corresponds to the so called "polynomial Szemerédi Theorem" proved by Bergelson and
Leibman [8]. When all transformations are weak mixing the answer is positive since in this
case the corresponding averages converge to $(\mu(A))^{\ell+1}$ (this follows from Theorem 1.4). In
general, even the case where all the polynomials are linear is open. Lastly, let us note that the
assumption that the polynomials are distinct is necessary. It is known (see for example [9])
that there exist (non-commuting) transformations $T, S$, acting on the same probability space
$(X, \mathcal{X}, \mu)$, and a set $A \in \mathcal{X}$ with $\mu(A) > 0$ and such that $\mu(T^{-n} A \cap S^{-n} A) = 0$ for every $n \in \mathbb{N}$.

1.4. General conventions and notation. The following notation will be used throughout
the article: $\mathbb{N} := \{1, 2, \ldots\}$, $Tf := f \circ T$, $\Re(z)$ is the real part of a complex number $z$. We write
$a: \mathbb{Z}_N \to \mathbb{C}$ when $a: \mathbb{N} \to \mathbb{C}$ is a periodic sequence with period $N$. We use boldface symbols
for vectors. If $F$ is a finite set and $a: F \to \mathbb{C}$, then $\mathbb{E}_{n \in F} \, a(n) := \frac{1}{|F|} \sum_{n \in F} a(n)$. For $r \in \mathbb{N}$,
we denote by $S_r a$ the sequence defined by $(S_r a)(n) := a(n + r)$. We use the symbol $\ll$ when
some expression is majorized by a constant multiple of some other expression. If this constant
depends on some variables $k_1, \ldots, k_\ell$ we write $\ll_{k_1, \ldots, k_\ell}$.



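2. Background Material

We gather some basic background material that we use throughout this article.

2.1. Basic facts from ergodic theory.

Systems. A system is a quadruple $(X, \mathcal{X}, \mu, T)$ where $(X, \mathcal{X}, \mu)$ is a Lebesgue probability space
and $T: X \to X$ is an invertible measure preserving transformation.

Factors. For the context of this article, a factor of a system $(X, \mathcal{X}, \mu, T)$ is a system $(X, \mathcal{Y}, \mu, T)$
where $\mathcal{Y}$ is a $T$-invariant sub-$\sigma$-algebra of $\mathcal{X}$. We often abuse terminology and refer to $\mathcal{Y}$ in
place of the quadruple $(X, \mathcal{Y}, \mu, T)$.

Isomorphic systems. Two systems $(X, \mathcal{X}, \mu, T)$ and $(Y, \mathcal{Y}, \nu, S)$ are isomorphic if there exists
a bijective measurable map $\pi: X' \to Y'$, where $X'$ is a $T$-invariant subset of $X$ and $Y'$ is an
$S$-invariant subset of $Y$, both of full measure, such that $\mu \circ \pi^{-1} = \nu$ and $(S \circ \pi)(x) = (\pi \circ T)(x)$
for every $x \in X'$.

Ergodicity and ergodic decomposition. We define $\mathcal{I} := \{A \in \mathcal{X} : \mu(T^{-1} A \, \triangle \, A) = 0\}$. A system is
ergodic if $\mathcal{I}$ consists only of sets with measure 0 or 1. Given an ergodic system and $f \in L^1(\mu)$,
the ergodic theorem states that for almost every $x \in X$ we have

$$\lim_{N \to \infty} \mathbb{E}_{n \in [1,N]} f(T^n x) = \int f \, d\mu.$$

Let $x \mapsto \mu_x$ be a regular version of the conditional measures with respect to the $\sigma$-algebra $\mathcal{I}$.
This means that the map $x \mapsto \mu_x$ is $\mathcal{I}$-measurable, and for every bounded measurable function
$f$ we have

$$\mathbb{E}(f | \mathcal{I})(x) = \int f \, d\mu_x \quad \text{for } \mu \text{ almost every } x \in X.$$

Then the ergodic decomposition of $\mu$ is

(6) $$\mu = \int \mu_x \, d\mu(x).$$

The measures $\mu_x$ have the additional property that for $\mu$ almost every $x \in X$ the system
$(X, \mathcal{X}, \mu_x, T)$ is ergodic.

2.2. The seminorms $|||\cdot|||_k$. The seminorms $|||\cdot|||_k$ were defined for ergodic systems in [14].
These definitions can be easily extended to non-ergodic systems.

Given a system $(X, \mathcal{X}, \mu, T)$ with ergodic decomposition as in (6) and a function $f \in L^\infty(\mu)$,
we define inductively

(7) $$|||f|||_1 := \| \mathbb{E}(f | \mathcal{I}) \|_{L^2(\mu)};$$

(8) $$|||f|||_{k+1}^{2^{k+1}} := \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N |||\overline{f} \cdot T^n f|||_k^{2^k}.$$

It can be shown that for every $k \in \mathbb{N}$ the limit above exists, and $|||\cdot|||_k$, thus defined, is a
seminorm on $L^\infty(\mu)$ (see [14], [10]). If further clarification is needed, we write $|||\cdot|||_{k,\mu}$ or
$|||\cdot|||_{k,T}$.

More explicitly, when $k \ge 2$, one has

(9) $$|||f|||_k^{2^k} = \lim_{N \to \infty} \mathbb{E}_{n_1 \in [1,N]} \cdots \lim_{N \to \infty} \mathbb{E}_{n_{k-1} \in [1,N]} \int \Big| \int \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} T^{\epsilon \cdot n} f \, d\mu_x \Big|^2 \, d\mu(x),$$

where $n = (n_1, \ldots, n_{k-1})$, $|\epsilon| := \epsilon_1 + \cdots + \epsilon_{k-1}$, and $\mathcal{C}$ denotes complex conjugation.
It follows that if $\|f\|_{L^\infty(\mu)} \le 1$, then $|||f|||_k \le \|f\|_{L^1(\mu)}^{1/2^k}$ for every
$k \in \mathbb{N}$.

For every function $f \in L^\infty(\mu)$ we have

$$|||f|||_{k,\mu}^{2^k} = \int |||f|||_{k,\mu_x}^{2^k} \, d\mu(x).$$

It follows that if $|||f|||_{k,\mu} = 0$, then $|||f|||_{k,\mu_x} = 0$ for $\mu$ almost every $x \in X$.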
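
As a numerical illustration of the pointwise ergodic theorem recalled in Section 2.1, the following sketch computes ergodic averages for an irrational rotation of the circle. The choice of system, observable, and the names `birkhoff_average`, `rotation`, `observable` are assumptions made only for this example; for this particular (uniquely ergodic) system the averages converge for every starting point, which is stronger than what the theorem states in general.

```python
import math


def birkhoff_average(f, T, x, N):
    """Return (1/N) * sum_{n=1}^{N} f(T^n x)."""
    total = 0.0
    for _ in range(N):
        x = T(x)
        total += f(x)
    return total / N


alpha = math.sqrt(2) - 1  # irrational rotation number (illustrative choice)


def rotation(x):
    """The rotation x -> x + alpha on the circle R/Z."""
    return (x + alpha) % 1.0


def observable(x):
    """f(x) = cos(2 pi x), whose integral over the circle is 0."""
    return math.cos(2 * math.pi * x)


average = birkhoff_average(observable, rotation, 0.3, 10000)
```

Here the space average is $\int_0^1 \cos(2\pi x)\,dx = 0$, and summing the geometric series shows that the ergodic average is $O(1/N)$ in absolute value.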



2.3. Nilsystems and nilsequences. A nilmanifold is a homogeneous space $X = G/\Gamma$ where
$G$ is a nilpotent Lie group, and $\Gamma$ is a discrete cocompact subgroup of $G$. If $G_{k+1} = \{e\}$, where
$G_k$ denotes the $k$-th commutator subgroup of $G$, we say that $X$ is a $k$-step nilmanifold.

A $k$-step nilpotent Lie group $G$ acts on $G/\Gamma$ by left translation, where the translation by a
fixed element $a \in G$ is given by $T_a(g\Gamma) = (ag)\Gamma$. By $m_X$ we denote the unique Borel probability
measure on $X$ that is invariant under the action of $G$ by left translations (called the normalized
Haar measure), and by $\mathcal{G}/\Gamma$ we denote the completion of the Borel $\sigma$-algebra of $G/\Gamma$. Fixing
an element $a \in G$, we call the system $(G/\Gamma, \mathcal{G}/\Gamma, m_X, T_a)$ a $k$-step nilsystem.

If $X = G/\Gamma$ is a $k$-step nilmanifold, $a \in G$, $x \in X$, and $f \in C(X)$, we call the sequence
$(f(a^n x))_{n \in \mathbb{N}}$ a basic $k$-step nilsequence. A $k$-step nilsequence is a uniform limit of basic $k$-step
nilsequences.

We are going to use the following result of A. Leibman (see Theorem A in [21]):

Theorem 2.1 ([21]). Let $X = G/\Gamma$ be a nilmanifold, $a_1, \ldots, a_\ell \in G$, $f_1, \ldots, f_\ell \in C(X)$,
and $p_1, \ldots, p_\ell: \mathbb{Z}^d \to \mathbb{Z}$ be polynomials. Then for every Følner sequence $(\Phi_N)_{N \in \mathbb{N}}$ in $\mathbb{Z}^d$ and
$x_1, \ldots, x_\ell \in X$ the averages

$$\frac{1}{|\Phi_N|} \sum_{n \in \Phi_N} f_1(a_1^{p_1(n)} x_1) \cdot \ldots \cdot f_\ell(a_\ell^{p_\ell(n)} x_\ell)$$

converge as $N \to \infty$.

2.4. The factors $\mathcal{Z}_k$ and their structure. Given a system $(X, \mathcal{X}, \mu, T)$, it was shown in [14]
(for ergodic systems but the same construction works for general systems) that for every $k \ge 1$,
there exists a $T$-invariant sub-$\sigma$-algebra $\mathcal{Z}_{k-1}$ of $\mathcal{X}$ that satisfies

(10) for $f \in L^\infty(\mu)$, $\mathbb{E}(f | \mathcal{Z}_{k-1}) = 0$ if and only if $|||f|||_{k,T} = 0$.

The connection between the factors $\mathcal{Z}_k$ of a given system and nilsystems is given by the following
structure theorem of Host and Kra:

Theorem 2.2 ([14]). Let $k \in \mathbb{N}$ and $(X, \mathcal{X}, \mu, T)$ be a system with ergodic decomposition as
in (6). Then for $\mu$ almost every $x \in X$ the system $(X, \mathcal{Z}_k, \mu_x, T)$ is an inverse limit of $k$-step
nilsystems.

The conclusion in the preceding statement means that for $\mu$ almost every $x \in X$, for the
given measure $\mu_x$ there exists an increasing sequence of $T$-invariant sub-$\sigma$-algebras $(\mathcal{X}_j)_{j \in \mathbb{N}}$
(depending on $\mu_x$), such that $\bigvee_{j \in \mathbb{N}} \mathcal{X}_j = \mathcal{Z}_k$ up to sets of $\mu_x$-measure zero, and each system
$(X, \mathcal{X}_j, \mu_x, T)$ is isomorphic to a $k$-step nilsystem.

We remark that although we do not make explicit use of Theorem 2.2 in this article, it is a
key ingredient in the proof of Proposition 3.8 that is crucial for our analysis.

3. Characteristic factors and convergence for cubic averages 

3.1. Characteristic factors for cubic averages. We are going to prove Theorem 1.2. The
main idea is best illustrated by considering the case of cubic averages of dimension 2. Assuming
for example that $f_1 \in L^\infty(\mu)$ satisfies $|||f_1|||_{2,\mu,T_1} = 0$, and $f_2, f_3 \in L^\infty(\mu)$, our goal is to establish
the pointwise identity

$$\lim_{N \to \infty} |\mathbb{E}_{m,n \in [1,N]} f_1(T_1^m x) \cdot f_2(T_2^n x) \cdot f_3(T_3^{m+n} x)| = 0.$$

It suffices to show that for $\mu$ almost every $x \in X$ we have

(11) $$\lim_{N \to \infty} \mathbb{E}_{n \in [1,N]} |\mathbb{E}_{m \in [1,N]} f_1(T_1^m x) \cdot f_3(T_3^{m+n} x)|^2 = 0.$$

Using suitable applications of a variation of van der Corput's fundamental lemma (the precise
statement is given in Lemma 3.3) we can show (see Proposition 3.6) that the limit in (11) is
bounded by a constant multiple of

(12) $$\lim_{N \to \infty} \mathbb{E}_{n \in [1,N]} \Big| \lim_{N \to \infty} \mathbb{E}_{m \in [1,N]} f_1(T_1^m x) \cdot \overline{f_1}(T_1^{m+n} x) \Big|^2.$$

The ergodic theorem implies that for $\mu$ almost every $x \in X$ the last limit is equal to

$$\lim_{N \to \infty} \mathbb{E}_{n \in [1,N]} \Big| \int f_1 \cdot T_1^n \overline{f_1} \, d\mu_{x,T_1} \Big|^2 = |||f_1|||_{2,\mu_{x,T_1},T_1}^4,$$

where $\mu = \int \mu_{x,T_1} \, d\mu(x)$ is the ergodic decomposition of the measure $\mu$ with respect to $T_1$. Since
$|||f_1|||_{2,\mu,T_1} = 0$ implies that $|||f_1|||_{2,\mu_{x,T_1},T_1} = 0$ for $\mu$ almost every $x \in X$, our goal is established.

Since most of the calculations and estimates do not depend on the dynamical structure of
the sequences $(f_\epsilon(T_\epsilon^n x))_{n \in \mathbb{N}}$ (it is only at the very last step of the argument that we use the
pointwise ergodic theorem to take advantage of this extra structure) we work them out for
general bounded sequences $(a_\epsilon(n))_{n \in \mathbb{N}}$.

Key to our study will be some quantities that control the limiting behavior of the cubic
averages (1) when the sequences $(f_\epsilon(T_\epsilon^n x))_{n \in \mathbb{N}}$ are replaced by general bounded sequences
$(a_\epsilon(n))_{n \in \mathbb{N}}$. Closely related quantities have been defined by T. Gowers in pjj and by B. Host
and B. Kra in [TB]. We define these and prove some basic estimates in the next subsections.

3.1.1. Measures of uniformity. We remind the reader that when we write $b: \mathbb{Z}_N \to \mathbb{C}$ we refer to
a periodic sequence $b: \mathbb{N} \to \mathbb{C}$ with period $N$. We say that $a = (a_N)_{N \in \mathbb{N}}$, where $a_N: \mathbb{Z}_N \to \mathbb{C}$, is
uniformly bounded, if there exists a constant $C \in \mathbb{R}$ such that $|a_N(n)| \le C$ for every $n \in [1, N]$
and $N \in \mathbb{N}$. For $k \in \mathbb{N}$, $z \in \mathbb{C}$, and $\epsilon \in V_k$, we let $|\epsilon| := \epsilon_1 + \cdots + \epsilon_k$, and $\mathcal{C}^k z := z$ if $k$ is even,
and $\mathcal{C}^k z := \overline{z}$ if $k$ is odd.

We let

$$|||a|||_{U_1(\mathbb{N})} := \limsup_{N \to \infty} |\mathbb{E}_{n \in [1,N]} a_N(n)|,$$

and for $k \ge 2$ we define

(13) $$|||a|||_{U_k(\mathbb{N})} := \Big( \limsup_{N \to \infty} \mathbb{E}_{n_1 \in [1,N]} \cdots \limsup_{N \to \infty} \mathbb{E}_{n_{k-1} \in [1,N]} \limsup_{N \to \infty} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} a_N(m + \epsilon \cdot n) \Big|^2 \Big)^{1/2^k},$$

where $n = (n_1, \ldots, n_{k-1})$.

Furthermore, for $N \in \mathbb{N}$ we let

$$|||a_N|||_{U_1(\mathbb{Z}_N)} := |\mathbb{E}_{n \in [1,N]} a_N(n)|,$$

and for $k \ge 2$ we define

$$|||a_N|||_{U_k(\mathbb{Z}_N)} := \Big( \mathbb{E}_{n \in [1,N]^{k-1}} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} a_N(m + \epsilon \cdot n) \Big|^2 \Big)^{1/2^k}.$$

This is the so called Gowers norm of $a_N$.
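
The finite quantity $|||a_N|||_{U_k(\mathbb{Z}_N)}$ can be evaluated directly from its definition. The sketch below is a brute-force illustration (cost of order $N^k 2^{k-1}$), with hypothetical helper names `gowers_norm` and `quadratic_phase`; it takes one period of the sequence as a Python list and reads indices modulo $N$.

```python
from itertools import product
import cmath


def gowers_norm(a, k):
    """|||a|||_{U_k(Z_N)} for a list `a` holding one period of an N-periodic sequence."""
    N = len(a)
    if k == 1:
        return abs(sum(a) / N)
    cube = list(product((0, 1), repeat=k - 1))  # the vertex set V_{k-1}
    total = 0.0
    for n in product(range(N), repeat=k - 1):  # residues mod N stand in for [1, N]
        inner = 0j
        for m in range(N):
            term = 1 + 0j
            for eps in cube:
                val = a[(m + sum(e * c for e, c in zip(eps, n))) % N]
                # C^{|eps|}: conjugate exactly when |eps| is odd
                term *= val.conjugate() if sum(eps) % 2 else val
            inner += term
        total += abs(inner / N) ** 2
    return (total / N ** (k - 1)) ** (1.0 / 2 ** k)


def quadratic_phase(N):
    """One period of the sequence n -> exp(2 pi i n^2 / N)."""
    return [cmath.exp(2j * cmath.pi * n * n / N) for n in range(N)]
```

For odd $N$ the quadratic phase has $U_2(\mathbb{Z}_N)$ norm $N^{-1/4}$ but $U_3(\mathbb{Z}_N)$ norm 1, a standard example of a sequence that is uniform of order 2 but not of order 3.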

Given a bounded sequence $a: \mathbb{N} \to \mathbb{C}$, for $N \in \mathbb{N}$, we define $a_N: \mathbb{Z}_N \to \mathbb{C}$ by $a_N(n + N\mathbb{Z}) := a(n)$
for $n \in [1, N]$. We let $\tilde{a} := (a_N)_{N \in \mathbb{N}}$. Furthermore, we define

$$|||a|||_{U_k(\mathbb{N})} := |||\tilde{a}|||_{U_k(\mathbb{N})}, \qquad |||a|||_{U_k(\mathbb{Z}_N)} := |||a_N|||_{U_k(\mathbb{Z}_N)}.$$

Notice that $|||a|||_{U_k(\mathbb{N})}$ can also be computed by replacing $a_N$ with $a$ in (13).
One immediately sees that $|||\cdot|||_{U_k(\mathbb{N})}$ satisfies the recursive identity

(14) $$\limsup_{N \to \infty} \mathbb{E}_{r \in [1,N]} |||S_r a \cdot \overline{a}|||_{U_k(\mathbb{N})}^{2^k} = |||a|||_{U_{k+1}(\mathbb{N})}^{2^{k+1}}.$$

We caution the reader that the triangle inequality does not necessarily hold for $|||\cdot|||_{U_k(\mathbb{N})}$, but
this is not going to play any role in this article.

The next result links the seminorms $|||\cdot|||_{U_k(\mathbb{N})}$ with the ergodic seminorms $|||\cdot|||_k$ that were
defined in Section 2.2 (a similar result was also established in Corollary 3.11 of [16]).

Proposition 3.1. Let $(X, \mathcal{X}, \mu, T)$ be a measure preserving system with ergodic decomposition
$\mu = \int \mu_x \, d\mu(x)$ and $f \in L^\infty(\mu)$. Then for $\mu$ almost every $x \in X$ we have

$$|||f(T^n x)|||_{U_k(\mathbb{N})} = |||f|||_{k,\mu_x}.$$

Proof. The ergodic theorem gives that for $\mu$ almost every $x \in X$, for every $n \in \mathbb{N}^{k-1}$ and
$\epsilon \in V_{k-1}$ we have



The result now follows by using the definition of $|||\cdot|||_{U_k(\mathbb{N})}$ and formula (9). □

3.1.2. Comparing $|||\cdot|||_{U_k(\mathbb{N})}$ with $\limsup_{N \to \infty} |||\cdot|||_{U_k(\mathbb{Z}_N)}$. The following estimate will be key
for our analysis:

Proposition 3.2. Let $a = (a_N)_{N \in \mathbb{N}}$, where $a_N: \mathbb{Z}_N \to \mathbb{C}$, be uniformly bounded by 1. Then
for every $k \in \mathbb{N}$ we have

$$\limsup_{N \to \infty} |||a|||_{U_k(\mathbb{Z}_N)} \ll_k |||a|||_{U_k(\mathbb{N})}.$$

To prove Proposition 3.2 we are going to use the following variation of van der Corput's
fundamental lemma:

Lemma 3.3. Let $N \in \mathbb{N}$ and $a: \mathbb{Z}_N \to \mathbb{C}$. Then for every $R \in \mathbb{N}$ we have

$$|\mathbb{E}_{n \in [1,N]} a(n)|^2 \le 2 \cdot \mathbb{E}_{r \in [1,R]} \Big(1 - \frac{r}{R}\Big) \Re\big(\mathbb{E}_{n \in [1,N]} a(n + r) \cdot \overline{a}(n)\big) + \frac{\mathbb{E}_{n \in [1,N]} |a(n)|^2}{R}.$$

Proof. Let $R \in \mathbb{N}$. Using the identity

$$\mathbb{E}_{n \in [1,N]} a(n) = \mathbb{E}_{n \in [1,N]} \mathbb{E}_{r \in [1,R]} a(n + r)$$

and the Cauchy-Schwarz inequality, we get that $|\mathbb{E}_{n \in [1,N]} a(n)|^2$ is bounded by

$$\mathbb{E}_{n \in [1,N]} |\mathbb{E}_{r \in [1,R]} a(n + r)|^2 = \mathbb{E}_{r,r' \in [1,R]} \mathbb{E}_{n \in [1,N]} a(n + r) \cdot \overline{a}(n + r').$$

Isolating those terms for which $r = r'$, and using the symmetry up to conjugation of the
remaining expression with respect to $r$ and $r'$, we see that the last expression is equal to

$$\frac{\mathbb{E}_{n \in [1,N]} |a(n)|^2}{R} + \frac{2}{R^2} \sum_{1 \le r' < r \le R} \Re\big(\mathbb{E}_{n \in [1,N]} a(n + r) \cdot \overline{a}(n + r')\big).$$

To end the proof, it suffices to perform the change of variables $n \to n - r'$ and notice that for
$k \in \{1, \ldots, R\}$ the equation $r - r' = k$ with $1 \le r' < r \le R$ has $R - k$ solutions. □
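
The inequality of Lemma 3.3 is easy to test numerically on periodic sequences. The sketch below is only a sanity check, not part of the argument; `avg` and `vdc_sides` are hypothetical helper names, and the sequence is passed as one period stored in a list of complex numbers.

```python
import random


def avg(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def vdc_sides(a, R):
    """Return (lhs, rhs) of the van der Corput type inequality for a periodic list a."""
    N = len(a)

    def corr(r):
        # E_{n in [1,N]} a(n + r) * conj(a(n)), with indices read modulo N
        return avg(a[(n + r) % N] * a[n].conjugate() for n in range(N))

    lhs = abs(avg(a)) ** 2
    main = 2 * avg((1 - r / R) * corr(r).real for r in range(1, R + 1))
    rhs = main + avg(abs(v) ** 2 for v in a) / R
    return lhs, rhs


random.seed(0)
sample = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(9)]
lhs, rhs = vdc_sides(sample, 5)
```

Since the proof only uses the Cauchy-Schwarz inequality followed by an exact rearrangement, `lhs <= rhs` should hold for every input, up to floating point rounding.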

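Lemma 3.4. Let $N \in \mathbb{N}$ and $a: \mathbb{Z}_N \to \mathbb{C}$ be bounded by 1. Then for every $R \in \mathbb{N}$ we have

$$\mathbb{E}_{n \in [1,N]} |\mathbb{E}_{m \in [1,N]} a(m + n) \cdot \overline{a}(m)|^2 \le 2 \cdot \mathbb{E}_{r \in [1,R]} |\mathbb{E}_{m \in [1,N]} a(m + r) \cdot \overline{a}(m)|^2 + 1/R.$$

Proof. Using Lemma 3.3 we deduce that the left hand side is bounded by

$$2 \cdot \mathbb{E}_{n \in [1,N]} \mathbb{E}_{r \in [1,R]} \Big(1 - \frac{r}{R}\Big) \Re\big(\mathbb{E}_{m \in [1,N]} a(m + n + r) \cdot \overline{a}(m + r) \cdot \overline{a}(m + n) \cdot a(m)\big) + 1/R.$$

Interchanging the averages and performing the change of variables $n \to n - m$ we deduce that
the last expression is equal to

$$2 \cdot \mathbb{E}_{r \in [1,R]} \Big(1 - \frac{r}{R}\Big) |\mathbb{E}_{m \in [1,N]} a(m + r) \cdot \overline{a}(m)|^2 + 1/R.$$

The result follows. □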
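
The estimate of Lemma 3.4 can be checked in the same way. This is again a hedged sanity check with hypothetical helper names (`correlation`, `lemma_3_4_sides`); the input is one period of a sequence bounded by 1, and indices are read modulo $N$.

```python
import cmath
import random


def correlation(a, r):
    """E_{m in [1,N]} a(m + r) * conj(a(m)) for a periodic list a."""
    N = len(a)
    return sum(a[(m + r) % N] * a[m].conjugate() for m in range(N)) / N


def lemma_3_4_sides(a, R):
    """Return (lhs, rhs) of the inequality in Lemma 3.4; assumes |a(n)| <= 1."""
    N = len(a)
    lhs = sum(abs(correlation(a, n)) ** 2 for n in range(1, N + 1)) / N
    rhs = 2 * sum(abs(correlation(a, r)) ** 2 for r in range(1, R + 1)) / R + 1 / R
    return lhs, rhs


random.seed(0)
# Random points of the closed unit disc, so the boundedness assumption holds.
sample = [cmath.rect(random.random(), random.uniform(0, 2 * cmath.pi)) for _ in range(10)]
lhs, rhs = lemma_3_4_sides(sample, 4)
```

The point of the lemma is that the average over all $N$ shifts on the left is controlled by an average over only $R$ shifts, at the cost of the error term $1/R$.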

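Next we prove Proposition 3.2 by successively applying Lemma 3.4.

Proof of Proposition 3.2. Remember that $a_N: \mathbb{Z}_N \to \mathbb{C}$ is defined by $a_N(n + N\mathbb{Z}) := a(n)$ for
$n \in [1, N]$. For $k = 1$ we have

$$\limsup_{N \to \infty} |||a|||_{U_1(\mathbb{Z}_N)} = \limsup_{N \to \infty} |||a_N|||_{U_1(\mathbb{Z}_N)} = |||a|||_{U_1(\mathbb{N})}.$$

Suppose that the statement holds for $k \in \mathbb{N}$, we are going to show that it holds for $k + 1$.
We have

(15) $$|||a_N|||_{U_{k+1}(\mathbb{Z}_N)}^{2^{k+1}} = \mathbb{E}_{n_1, \ldots, n_k \in [1,N]} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_k} \mathcal{C}^{|\epsilon|} a_N(m + \epsilon_1 n_1 + \cdots + \epsilon_k n_k) \Big|^2.$$

We fix $n_1, \ldots, n_{k-1} \in \mathbb{N}$ and apply Lemma 3.4 for $A_{N,n_1,\ldots,n_{k-1}}: \mathbb{Z}_N \to \mathbb{C}$ defined by

$$A_{N,n_1,\ldots,n_{k-1}}(m) := \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} a_N(m + \epsilon_1 n_1 + \cdots + \epsilon_{k-1} n_{k-1}).$$

We deduce that for every $R, N \in \mathbb{N}$, the right hand side of (15) is bounded by

$$2 \cdot \mathbb{E}_{n_1, \ldots, n_{k-1} \in [1,N]} \mathbb{E}_{n_k \in [1,R]} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_k} \mathcal{C}^{|\epsilon|} a_N(m + \epsilon_1 n_1 + \cdots + \epsilon_k n_k) \Big|^2 + 1/R.$$

Next, we fix $n_k \in \mathbb{N}$, and use the inductive hypothesis for the sequence $S_{n_k} a \cdot \overline{a}$. We get

$$\limsup_{N \to \infty} \mathbb{E}_{n_1, \ldots, n_{k-1} \in [1,N]} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_k} \mathcal{C}^{|\epsilon|} a_N(m + \epsilon_1 n_1 + \cdots + \epsilon_k n_k) \Big|^2 = \limsup_{N \to \infty} |||S_{n_k} a \cdot \overline{a}|||_{U_k(\mathbb{Z}_N)}^{2^k} \ll_k |||S_{n_k} a \cdot \overline{a}|||_{U_k(\mathbb{N})}^{2^k}.$$

Combining the previous estimates we get for every positive integer $R$ that

$$\limsup_{N \to \infty} |||a|||_{U_{k+1}(\mathbb{Z}_N)}^{2^{k+1}} = \limsup_{N \to \infty} |||a_N|||_{U_{k+1}(\mathbb{Z}_N)}^{2^{k+1}} \ll_k \mathbb{E}_{n_k \in [1,R]} |||S_{n_k} a \cdot \overline{a}|||_{U_k(\mathbb{N})}^{2^k} + 1/R.$$

Finally, taking the limsup as $R \to \infty$, and using the identity (14) we get the advertised
estimate. □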

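3.1.3. Proof of Theorem 1.2. We first recall a known estimate (see Lemma 3.8 in [11]).

Lemma 3.5 (Gowers-Cauchy-Schwarz Inequality). Let $k \ge 2$ be an integer, $N \in \mathbb{N}$, and
for $\epsilon \in V_{k-1}$ let $a_\epsilon: \mathbb{Z}_N \to \mathbb{C}$. Then

$$\mathbb{E}_{n \in [1,N]^{k-1}} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} a_\epsilon(m + \epsilon \cdot n) \Big|^2 \le \prod_{\epsilon \in V_{k-1}} |||a_\epsilon|||_{U_k(\mathbb{Z}_N)}^2.$$

Combining Lemma 3.5 and Proposition 3.2 we are going to prove the following key estimate:

Proposition 3.6. Let $k \ge 2$ be an integer and for $\epsilon \in V_{k-1}$ let $a_\epsilon: \mathbb{N} \to \mathbb{C}$ be sequences. Then

(16) $$\limsup_{N \to \infty} \mathbb{E}_{n \in [1,N]^{k-1}} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} a_\epsilon(m + \epsilon \cdot n) \Big|^2 \ll_k \prod_{\epsilon \in V_{k-1}} |||a_\epsilon|||_{U_k(\mathbb{N})}^2.$$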


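Proof. We fix $k \ge 2$, $N \in \mathbb{N}$, and for $\epsilon \in V_{k-1}$ we define $\tilde{a}_{\epsilon,N}: \mathbb{Z}_N \to \mathbb{C}$ as follows:

$$\tilde{a}_{\epsilon,N}(n + N\mathbb{Z}) := \begin{cases} a_{\mathbf{0}}(n) \cdot 1_{[1,[N/k]]}(n) & \text{for } \epsilon = \mathbf{0}, \\ a_\epsilon(n) & \text{for } \epsilon \ne \mathbf{0}, \end{cases}$$

where $\mathbf{0} = (0, 0, \ldots, 0)$ and $n \in \{1, \ldots, N\}$. Suppose that the element $n \in \mathbb{N}^{k-1}$ has all its
coordinates in the interval $[1, [N/k]]$. Then

$$\prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} \tilde{a}_{\epsilon,N}(m + \epsilon \cdot n) = \begin{cases} \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} a_\epsilon(m + \epsilon \cdot n) & \text{for } m \in [1, [N/k]], \\ 0 & \text{for } m \in ([N/k], N]. \end{cases}$$

It follows that

$$\mathbb{E}_{n \in [1,[N/k]]^{k-1}} \Big| \mathbb{E}_{m \in [1,[N/k]]} \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} a_\epsilon(m + \epsilon \cdot n) \Big|^2$$

is at most a constant, depending only on $k$, times

$$\mathbb{E}_{n \in [1,[N/k]]^{k-1}} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} \tilde{a}_{\epsilon,N}(m + \epsilon \cdot n) \Big|^2,$$

which in turn is at most a constant, depending only on $k$, times

$$\mathbb{E}_{n \in [1,N]^{k-1}} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_{k-1}} \mathcal{C}^{|\epsilon|} \tilde{a}_{\epsilon,N}(m + \epsilon \cdot n) \Big|^2.$$

Using Lemma 3.5 we see that the last expression is bounded by a constant multiple of

$$\prod_{\epsilon \in V_{k-1}} |||\tilde{a}_{\epsilon,N}|||_{U_k(\mathbb{Z}_N)}^2.$$

Combining the above, taking limits as $N \to \infty$, and using Proposition 3.2 we deduce that the
left hand side of (16) is bounded by a constant multiple of

$$\prod_{\epsilon \in V_{k-1}} |||\tilde{a}_\epsilon|||_{U_k(\mathbb{N})}^2,$$

where $\tilde{a}_\epsilon = (\tilde{a}_{\epsilon,N})_{N \in \mathbb{N}}$. Furthermore, an easy computation shows that

$$|||\tilde{a}_\epsilon|||_{U_k(\mathbb{N})} = \begin{cases} k^{-1/2^{k-1}} \cdot |||a_{\mathbf{0}}|||_{U_k(\mathbb{N})} & \text{for } \epsilon = \mathbf{0}, \\ |||a_\epsilon|||_{U_k(\mathbb{N})} & \text{for } \epsilon \ne \mathbf{0}. \end{cases}$$

This completes the proof. □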

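Applying the previous estimate for suitably chosen sequences we get the following:

Corollary 3.7. Let $k \ge 2$ be an integer, $(X, \mathcal{X}, \mu)$ be a probability space, and for $\epsilon \in V_{k-1}$ let
$T_\epsilon: X \to X$ be measure preserving transformations, and $f_\epsilon \in L^\infty(\mu)$ be functions. Furthermore,
let $\mu = \int \mu_{x,T_\epsilon} \, d\mu(x)$ be the ergodic decomposition of the measure $\mu$ with respect to $T_\epsilon$. Then
for $\mu$ almost every $x \in X$ we have

$$\limsup_{N \to \infty} \mathbb{E}_{n \in [1,N]^{k-1}} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_{k-1}} f_\epsilon(T_\epsilon^{m + \epsilon \cdot n} x) \Big|^2 \ll_k \prod_{\epsilon \in V_{k-1}} |||f_\epsilon|||_{k,\mu_{x,T_\epsilon},T_\epsilon}^2.$$

Proof. Let $x \in X$. Applying Proposition 3.6 for the sequences $a_\epsilon(n) = f_\epsilon(T_\epsilon^n x)$, $\epsilon \in V_{k-1}$, we
get that the left hand side is bounded by a constant multiple of

$$\prod_{\epsilon \in V_{k-1}} |||f_\epsilon(T_\epsilon^n x)|||_{U_k(\mathbb{N})}^2.$$

Proposition 3.1 gives that for every $\epsilon \in V_{k-1}$, for $\mu$ almost every $x \in X$, we have

$$|||f_\epsilon(T_\epsilon^n x)|||_{U_k(\mathbb{N})} = |||f_\epsilon|||_{k,\mu_{x,T_\epsilon},T_\epsilon}.$$

This completes the proof. □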

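We are now one small step from proving Theorem 1.2.

Proof of Theorem 1.2. Suppose that $|||f_{\mathbf{1}}|||_{k,T_{\mathbf{1}}} = 0$, where $\mathbf{1} = (1, 0, \ldots, 0)$. The proof is
similar in the other cases. We want to show that for $\mu$ almost every $x \in X$

$$\lim_{N \to \infty} \mathbb{E}_{n \in [1,N]^k} \prod_{\epsilon \in V_k^*} f_\epsilon(T_\epsilon^{\epsilon \cdot n} x) = 0.$$

Using the Cauchy-Schwarz inequality and bounding all functions $f_{(0,\epsilon)}$, where $\epsilon \in V_{k-1}^*$, by
their sup norm, we deduce that the expression

$$\Big| \mathbb{E}_{n \in [1,N]^k} \prod_{\epsilon \in V_k^*} f_\epsilon(T_\epsilon^{\epsilon \cdot n} x) \Big|^2$$

is bounded by a constant multiple of an average of the form

(17) $$\mathbb{E}_{n \in [1,N]^{k-1}} \Big| \mathbb{E}_{m \in [1,N]} \prod_{\epsilon \in V_{k-1}} \tilde{f}_\epsilon(\tilde{T}_\epsilon^{m + \epsilon \cdot n} x) \Big|^2,$$

where $\tilde{f}_{\mathbf{0}} = f_{\mathbf{1}}$, $\tilde{f}_\epsilon \in \{f_{(1,\epsilon)}, \epsilon \in V_{k-1}\}$, and $\tilde{T}_\epsilon \in \{T_{(1,\epsilon)}, \epsilon \in V_{k-1}\}$ for $\epsilon \in V_{k-1}$.

Since $|||f_{\mathbf{1}}|||_{k,\mu,T_{\mathbf{1}}} = 0$ implies that $|||f_{\mathbf{1}}|||_{k,\mu_{x,T_{\mathbf{1}}},T_{\mathbf{1}}} = 0$ for $\mu$ almost every $x \in X$, Corollary 3.7
gives that for $\mu$ almost every $x \in X$ the averages (17) converge to 0. This completes the
proof. □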

3.2. Convergence of cubic averages. In this section we are going to prove Theorem 1.1.
A natural approach for establishing such a convergence result would be to try to combine
Theorem 1.2 with Theorem 2.2 in order to reduce matters to the case where all systems are
nilsystems. Such an approach works well when all the transformations are equal, but in our
more general setup it presents problems that are difficult to circumvent. For instance, although
it is possible to reduce matters to the case where for every $\epsilon \in V_k^*$ the ergodic components
of the transformation $T_\epsilon$ are inverse limits of nilsystems, the various ergodic disintegrations
and sub-$\sigma$-algebras involved in the inverse limits cannot be taken to be the same for each
transformation $T_\epsilon$ (even if the transformations commute). To overcome this problem we work
pointwise, and use an approach similar to the one in [10]. We combine Theorem 1.2 with a
pointwise decomposition result that applies to general (not necessarily ergodic) systems. It is
a direct consequence of Proposition 3.1 from [10] which in turn is a non-trivial consequence of
the structure theorem of Host and Kra stated in Theorem 2.2.

Proposition 3.8. Let $(X, \mathcal{X}, \mu, T)$ be a system, $f \in L^\infty(\mu)$, and $k \in \mathbb{N}$. Then for every $\varepsilon > 0$,
there exist measurable functions $f^s, f^u, f^e$, with $L^\infty(\mu)$ norm at most $2 \|f\|_{L^\infty(\mu)}$, such that

(i) $f = f^s + f^u + f^e$;

(ii) $|||f^u|||_{k+1} = 0$; $\|f^e\|_{L^1(\mu)} \le \varepsilon$; and

(iii) for $\mu$ almost every $x \in X$, the sequence $(f^s(T^n x))_{n \in \mathbb{N}}$ is a $k$-step nilsequence.

Arithmetic versions of this result were recently established in [12] and in [23]. The reader is
advised to think of the function $f^e$ as an error term; when one works with convergence problems
it typically can be shown to have a negligible effect on our averages (but this is not the case
for recurrence problems unless one aims at a uniform lower bound). The function $f^u$ is the
uniform component and it too can be neglected once the appropriate uniformity estimates are
obtained. Finally, the function $f^s$ is the structured component; this has to be further analyzed,
typically using equidistribution results on nilmanifolds.

Proof of Theorem \l.l[ For k = 1 the result follows from the pointwise ergodic theorem. So we 
can assume that k >2. Furthermore, we can assume that ||/e|lioo(^) < 1 for every e G V^. Let 

AM){x) :=E„g[i,^]. H f,{Trx). 

We are going to show that for fi almost every x £ X the sequence {A]\f{f(:){x))]\f^^ is Cauchy. 

By Proposition 13.81 we have that for every m S N and e G V^*, there exist measurable 
functions f^^m fem-> ferric with {fJ.) uorm bounded by 2, and such that 
(\) f = -I- f" -\- ■ 

(ii) lll/eVllU,T. =0; ||/,%||^,(^) < 1/m; and 

(iii) for ji almost every x G X, the sequence {f^,^{T^x))neN is a (/c — l)-step nilsequence. 
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First we study the contribution of the functions f^^- Combining property ^ with The- 
orem [L2l we see that when evaluating the hmit of the averages Ai\i{fe), we can ignore the 
contribution of these functions, namely, for every m £ N, for fi almost every x & X we have 

(18) hm \AN{fe){x) - A^Ulm + flM\ = 0. 

Next, we study the contribution of the functions $f^e_{e,m}$. We are going to show that this too is
essentially negligible, as long as we consider suitably large values of $m$. Indeed, if we expand
the expression $A_N(f^s_{e,m} + f^e_{e,m}) - A_N(f^s_{e,m})$, use Corollary 3.7 to bound each of the terms, and
also use that $|\!|\!|f^s_{e,m}|\!|\!|_{k,\mu_x,T_e} \leq 2$ and $|\!|\!|f^e_{e,m}|\!|\!|_{k,\mu_x,T_e} \leq 2$, we get for $\mu$ almost every $x \in X$ the
bound

$\limsup_{N\to\infty} |A_N(f^s_{e,m} + f^e_{e,m})(x) - A_N(f^s_{e,m})(x)| \ll_k \max_{e\in V_k^*} |\!|\!|f^e_{e,m}|\!|\!|_{k,\mu_x,T_e} \ll \max_{e\in V_k^*} \|f^e_{e,m}\|_{L^1(\mu_x)}^{1/2^k}.$

By property (ii) we have $\lim_{m\to\infty} \int |f^e_{e,m}|\, d\mu = 0$ for $e \in V_k^*$, and as a consequence there exists
a sequence $(m_l)_{l\in\mathbb{N}}$, with $m_l \to \infty$, and such that for $\mu$ almost every $x \in X$ we have

$\lim_{l\to\infty} \int |f^e_{e,m_l}|\, d\mu_x = 0$

for every $e \in V_k^*$. From the preceding discussion it follows that for $\mu$ almost every $x \in X$

(19)    $\lim_{l\to\infty} \limsup_{N\to\infty} |A_N(f^s_{e,m_l} + f^e_{e,m_l})(x) - A_N(f^s_{e,m_l})(x)| = 0.$



Combining (18) and (19) we get for $\mu$ almost every $x \in X$ that

(20)    $\lim_{l\to\infty} \limsup_{N\to\infty} |A_N(f_e)(x) - A_N(f^s_{e,m_l})(x)| = 0.$

Since by property (iii), for $\mu$ almost every $x \in X$, for every $l \in \mathbb{N}$, and $e \in V_k^*$, the sequence
$(f^s_{e,m_l}(T_e^n x))_{n\in\mathbb{N}}$ is a nilsequence, it follows from Theorem 2.1 that for $\mu$ almost every $x \in X$,
for every $l \in \mathbb{N}$, the averages $A_N(f^s_{e,m_l})(x)$ converge. Combining this with (20), we deduce
that for $\mu$ almost every $x \in X$ the sequence $(A_N(f_e)(x))_{N\in\mathbb{N}}$ is Cauchy. This completes the
proof. □

4. Characteristic factors and convergence for polynomial averages 

In this section we are going to prove Theorems 1.3 and 1.4. As was the case with the
cubic averages, some uniformity estimates for general bounded sequences play a key role in the 
argument. We start with establishing these. 

4.1. Uniformity estimates. We remind the reader that in the forthcoming statements $b\colon \mathbb{N} \to \mathbb{N}$
is a sequence that satisfies

$b(N) \to \infty$ and $b(N)/N^{1/d} \to 0$

where $d$ is the maximum degree of the polynomials involved in each statement. To avoid
confusion, let us also remark that none of the sequences defined in this section is
assumed to be periodic.

Our goal in this section is to establish the following estimate: 

Proposition 4.1. Let $a_1, \ldots, a_\ell\colon \mathbb{N} \to \mathbb{C}$ be bounded sequences and $p_1, \ldots, p_\ell \in \mathbb{Z}[t]$ be
non-constant polynomials such that $p_i - p_j$ is non-constant for $i \neq j$. Then there exists $k \in \mathbb{N}$,
depending only on $\ell$ and the maximum degree of the polynomials $p_1, \ldots, p_\ell$, such that if
$|\!|\!|a_i|\!|\!|_{U_k(\mathbb{N})} = 0$ for some $i \in \{1, \ldots, \ell\}$, then

$\lim_{N\to\infty} \mathbb{E}_{m\in[1,N]} \Big| \mathbb{E}_{n\in[1,b(N)]} \prod_{i=1}^{\ell} a_i(m + p_i(n)) \Big| = 0.$

It will be more convenient for us to prove a somewhat more involved statement, where the 
uniform sequence is associated with the polynomial of maximal degree: 

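Proposition 4.2. Let $a_1\colon \mathbb{N} \to \mathbb{C}$ be a bounded sequence and $a_{2,N}, \ldots, a_{\ell,N}\colon \mathbb{N} \to \mathbb{C}$, $N \in \mathbb{N}$, be
a collection of uniformly bounded sequences. Furthermore, let $p_1, \ldots, p_\ell \in \mathbb{Z}[t]$ be polynomials
such that $p_1 - p_i$ is non-constant for $i = 2, \ldots, \ell$, and suppose that $\deg(p_1) \geq \deg(p_i)$ for
$i = 2, \ldots, \ell$. Then there exists $k \in \mathbb{N}$, depending only on $\ell$ and on $\deg(p_1)$, such that if
$|\!|\!|a_1|\!|\!|_{U_k(\mathbb{N})} = 0$, then

(21)    $\lim_{N\to\infty} \mathbb{E}_{m\in[1,N]} \Big| \mathbb{E}_{n\in[1,b(N)]} a_1(m + p_1(n)) \cdot \prod_{i=2}^{\ell} a_{i,N}(m + p_i(n)) \Big| = 0.$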



Proof of Proposition 4.1 assuming Proposition 4.2. Let $\{p_1, \ldots, p_\ell\}$ be a family of polynomials
that satisfies the assumptions of Proposition 4.1. Because of the symmetry of the statement of
Proposition 4.1 it suffices to establish its conclusion when $i = 1$.
For $N \in \mathbb{N}$ we define a sequence $a_{0,N}\colon \mathbb{N} \to \mathbb{C}$ by

$a_{0,N}(m) := \mathbb{E}_{n\in[1,b(N)]} \prod_{i=1}^{\ell} \overline{a_i}(m + p_i(n)).$

Then

(22)    $\mathbb{E}_{m\in[1,N]} \Big| \mathbb{E}_{n\in[1,b(N)]} \prod_{i=1}^{\ell} a_i(m + p_i(n)) \Big|^2 = \mathbb{E}_{m\in[1,N]} \mathbb{E}_{n\in[1,b(N)]} \Big( a_{0,N}(m) \prod_{i=1}^{\ell} a_i(m + p_i(n)) \Big).$

Let $p \in \{p_1, \ldots, p_\ell\}$ be any polynomial such that the polynomial $p + p_1$ has maximal degree
within the family $\{p, p + p_1, \ldots, p + p_\ell\}$. Making the change of variables $m \to m + p(n)$, and
using our growth assumption $p(b(N))/N \to 0$, we see that the difference of the averages

(23)    $\mathbb{E}_{n\in[1,b(N)]} \mathbb{E}_{m\in[1,N]} \Big( a_{0,N}(m) \prod_{i=1}^{\ell} a_i(m + p_i(n)) \Big)$

and the averages

(24)    $\mathbb{E}_{n\in[1,b(N)]} \mathbb{E}_{m\in[1,N]} \Big( a_{0,N}(m + p(n)) \prod_{i=1}^{\ell} a_i(m + p(n) + p_i(n)) \Big)$

converges to 0 as $N \to \infty$. Since by assumption the polynomials $p_1$ and $p_1 - p_i$ are non-constant
for $i = 2, \ldots, \ell$, and by the choice of $p$ the polynomial $p + p_1$ has maximal degree within the
family $\{p, p + p_1, \ldots, p + p_\ell\}$, the assumptions of Proposition 4.2 are satisfied, where the role of
$p_1$ is played by the polynomial $p + p_1$. Using the Cauchy-Schwarz inequality we conclude that there
exists $k \in \mathbb{N}$, depending only on $\ell$ and on $\deg(p + p_1)$, such that if $|\!|\!|a_1|\!|\!|_{U_k(\mathbb{N})} = 0$, then the
averages (24) converge to 0 as $N \to \infty$. As a consequence, the averages (23) converge to 0 as
$N \to \infty$. The result now follows from (22). □

We are going to prove Proposition 4.2 by repeated applications of the following consequence
of van der Corput's fundamental estimate (see, for example, Lemma 3.1 in [18]):

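Corollary 4.3. Let $N \in \mathbb{N}$ and $a(1), \ldots, a(N)$ be complex numbers bounded by 1. Then for
every integer $R$ between 1 and $N$ we have

$|\mathbb{E}_{n\in[1,N]} a(n)|^2 \leq 4 \cdot \Big( \mathbb{E}_{r\in[1,R]} (1 - rR^{-1})\, \Re\big( \mathbb{E}_{n\in[1,N]} a(n + r) \cdot \overline{a(n)} \big) + R^{-1} + RN^{-1} \Big).$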
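The estimate of Corollary 4.3 can be sanity-checked numerically. The following is a minimal sketch, assuming the inequality is read as $|\mathbb{E}_{n\in[1,N]} a(n)|^2 \leq 4(\mathbb{E}_{r\in[1,R]}(1-r/R)\Re(\mathbb{E}_{n\in[1,N]} a(n+r)\overline{a(n)}) + 1/R + R/N)$ and that $a(n)$ is treated as 0 for $n > N$; both the zero-extension convention and the function names are assumptions of this illustration, not statements from the paper.

```python
import cmath
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def vdc_sides(a, R):
    """Return (lhs, rhs) of the estimate for the finite sequence a(1), ..., a(N).

    Assumption of this sketch: a(n) is read as 0 whenever n > N.
    """
    N = len(a)
    if not 1 <= R <= N:
        raise ValueError("R must be an integer between 1 and N")

    def term(n):
        # 1-indexed access with zero extension beyond N
        return a[n - 1] if n <= N else 0

    lhs = abs(mean(a)) ** 2
    weighted = []
    for r in range(1, R + 1):
        corr = mean([term(n + r) * complex(term(n)).conjugate() for n in range(1, N + 1)])
        weighted.append((1 - r / R) * corr.real)
    rhs = 4 * (mean(weighted) + 1 / R + R / N)
    return lhs, rhs


def quadratic_phase(N, alpha):
    """The unimodular sequence a(n) = exp(2*pi*i*alpha*n^2), n = 1, ..., N."""
    return [cmath.exp(2j * cmath.pi * alpha * n * n) for n in range(1, N + 1)]


def random_unimodular(N, seed=0):
    """A reproducible sequence of N random points on the unit circle."""
    rng = random.Random(seed)
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(N)]
```

Calling `vdc_sides(quadratic_phase(300, 2 ** 0.5), 10)` returns the two sides for a quadratic phase; the first entry should not exceed the second. A check of this kind only probes individual sequences and is no substitute for the proof in [18].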

4.1.1. The linear case. The next lemma will be used to prove the linear case of Proposition 4.1.
Furthermore, its proof contains the main technical maneuver needed to carry out the inductive
step in the proof of Proposition 4.1.

Lemma 4.4. Let $a\colon \mathbb{N} \to \mathbb{C}$ be a sequence bounded by 1. Then

$\limsup_{N\to\infty} \mathbb{E}_{m\in[1,N]} |\mathbb{E}_{n\in[1,b(N)]} a(m + n)|^2 \leq 4\, |\!|\!|a|\!|\!|_{U_2(\mathbb{N})}.$

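Proof. Since $b(N) \to \infty$ as $N \to \infty$, by Corollary 4.3 we get that for every $R \in \mathbb{N}$ the limit

$\limsup_{N\to\infty} \mathbb{E}_{m\in[1,N]} |\mathbb{E}_{n\in[1,b(N)]} a(m + n)|^2$

is bounded by 4 times the expression

$\limsup_{N\to\infty} \mathbb{E}_{m\in[1,N]} \mathbb{E}_{r\in[1,R]} (1 - rR^{-1})\, \Re\big( \mathbb{E}_{n\in[1,b(N)]} a(m + n + r) \cdot \overline{a(m + n)} \big) + R^{-1}.$

We interchange averages and make the change of variables $m \to m - n$. Since $b(N)/N \to 0$
and the sequence $(a(n))_{n\in\mathbb{N}}$ is bounded, we deduce that the last expression is equal to

$\limsup_{N\to\infty} \mathbb{E}_{r\in[1,R]} (1 - rR^{-1})\, \Re\big( \mathbb{E}_{m\in[1,N]} a(m + r) \cdot \overline{a(m)} \big) + R^{-1}.$

Finally, letting $R \to \infty$ we get that the original limit is bounded by 4 times

$\limsup_{R\to\infty} \mathbb{E}_{r\in[1,R]} \limsup_{N\to\infty} |\mathbb{E}_{m\in[1,N]} a(m + r) \cdot \overline{a(m)}| \leq |\!|\!|a|\!|\!|^2_{U_2(\mathbb{N})}.$

Since $|\!|\!|a|\!|\!|_{U_2(\mathbb{N})} \leq 1$, this establishes the advertised estimate. □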

Proof of Proposition 4.2 for linear polynomials. For notational convenience we let $a_{1,N} := a_1$
for every $N \in \mathbb{N}$. It suffices to show that if $\|a_{i,N}\|_\infty \leq 1$ for $i = 1, \ldots, \ell$ and $N \in \mathbb{N}$, then

(25)    $\limsup_{N\to\infty} \mathbb{E}_{m\in[1,N]} \Big| \mathbb{E}_{n\in[1,b(N)]} \prod_{i=1}^{\ell} a_{i,N}(m + k_i n) \Big| \ll_{k_1,\ldots,k_\ell} |\!|\!|a_1|\!|\!|_{U_{\ell+1}(\mathbb{N})}.$

We use induction on $\ell$, the number of sequences involved.
For $\ell = 1$ the result follows from Lemma 4.4 and the estimate

(26)    $\limsup_{N\to\infty} \mathbb{E}_{n\in[1,N]} |a(kn)| \leq k \cdot \limsup_{N\to\infty} \mathbb{E}_{n\in[1,N]} |a(n)|.$

To carry out the inductive step, let $\ell \geq 2$, and suppose that the statement holds for $\ell - 1$
sequences. Following the argument used in the proof of Lemma 4.4, using the induction
hypothesis, and the estimate (26), we get that the left hand side in (25) is bounded by a
constant multiple, depending on $k_1, \ldots, k_\ell$, of

$\limsup_{R\to\infty} \mathbb{E}_{r\in[1,R]} |\!|\!|S_r a_1 \cdot \overline{a_1}|\!|\!|_{U_\ell(\mathbb{N})} \leq |\!|\!|a_1|\!|\!|^2_{U_{\ell+1}(\mathbb{N})}$

where the last estimate follows from (14) and Hölder's inequality. Since $\|a_1\|_\infty \leq 1$, we have
$|\!|\!|a_1|\!|\!|_{U_{\ell+1}(\mathbb{N})} \leq 1$. This completes the proof. □

4.1.2. The general case. We first explain an induction scheme, often called PET induction
(Polynomial Exhaustion Technique), on types of families of polynomials that was introduced
by Bergelson in [5].

We define the degree of a family $\mathcal{P}$ of non-constant polynomials to be the maximum of the
degrees of the polynomials in the family. Let $\mathcal{P}_i$ be the subfamily of polynomials of degree $i$ in
$\mathcal{P}$. We let $w_i$ denote the number of distinct leading coefficients that appear in the family $\mathcal{P}_i$.
The vector $(d, w_d, \ldots, w_1)$ is called the type of the family of polynomials $\mathcal{P}$. We order the set
of all possible types lexicographically, meaning, $(d, w_d, \ldots, w_1) > (d', w'_{d'}, \ldots, w'_1)$ if and only if
in the first instance where the two vectors disagree the coordinate of the first vector is greater
than the coordinate of the second vector. One easily verifies that every decreasing sequence
of types is eventually constant, thus, if some operation reduces the type, then after a finite
number of repetitions it is going to terminate.
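The definition of the type can be made concrete with a few lines of code. The sketch below is only an illustration: the representation of a polynomial by its coefficient list `[c0, c1, ..., cd]` and the function names are assumptions made here. It relies on the fact that Python compares tuples lexicographically, which agrees with the ordering of types just described, since two types with different first coordinate $d$ are already decided at that coordinate.

```python
def degree(p):
    """Degree of a polynomial given by its coefficient list [c0, c1, ..., cd]."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return -1  # convention used here for the zero polynomial


def family_type(family):
    """Return the type (d, w_d, ..., w_1) of a family of non-constant polynomials."""
    degrees = [degree(p) for p in family]
    if min(degrees) < 1:
        raise ValueError("all polynomials in the family must be non-constant")
    d = max(degrees)
    # leading[i] collects the distinct leading coefficients among degree-i members
    leading = {i: set() for i in range(1, d + 1)}
    for p, deg in zip(family, degrees):
        leading[deg].add(p[deg])
    return (d,) + tuple(len(leading[i]) for i in range(d, 0, -1))
```

For the family $\{n^2,\ n^2 + n,\ 2n^2,\ 3n\}$, encoded as `[[0, 0, 1], [0, 1, 1], [0, 0, 2], [0, 3]]`, the degree is 2, there are two distinct leading coefficients in degree 2 and one in degree 1, so the type is $(2, 2, 1)$.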

Next, we define such an operation: Let $\mathcal{P} = (p_1, \ldots, p_\ell)$ be an ordered family of polynomials,
$p \in \mathcal{P}$, and $r \in \mathbb{N}$. The family $(p, r)$-vdC$(\mathcal{P})$ consists of all non-constant polynomials of the
form $p_i - p$, $S_r p_i - p$, $i = 1, \ldots, \ell$, where $S_r p$ is defined by $(S_r p)(n) := p(n + r)$. We order them
so that the polynomial $S_r p_1 - p$ appears first.

We call an ordered family of polynomials $(p_1, \ldots, p_\ell)$ nice if $\deg(p_1) \geq \deg(p_i)$ and $p_1 - p_i$ is
non-constant for $i = 2, \ldots, \ell$.

Lemma 4.5. Let $\mathcal{P} = (p_1, \ldots, p_\ell)$ be a nice ordered family of polynomials, and suppose that
$\deg(p_1) \geq 2$. Then there exists a polynomial $p \in \mathcal{P}$, such that for every large enough $r \in \mathbb{N}$, the
family $(p, r)$-vdC$(\mathcal{P})$ is nice and has strictly smaller type than that of $\mathcal{P}$.

Proof. If all the polynomials have the same degree and leading coefficient, then we take $p = p_1$.
If all the polynomials have the same degree and at least one has different leading coefficient than
$p_1$, then we take any such polynomial as $p$. Otherwise, there exists a non-constant polynomial
in $\mathcal{P}$ with degree strictly smaller than the degree of $p_1$. We take $p$ to be any such polynomial
that has minimal degree. In all cases, it is easy to check the advertised property. □
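To see the operation and Lemma 4.5 at work on small examples, one can compute the family $(p, r)$-vdC$(\mathcal{P})$ explicitly. The following is a minimal sketch under stated assumptions: polynomials are integer coefficient lists `[c0, c1, ..., cd]`, all function names are hypothetical, and no duplicates are removed from the resulting family.

```python
from math import comb


def trim(p):
    """Drop trailing zero coefficients (coefficient lists are [c0, c1, ...])."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def degree(p):
    """Degree of a polynomial; -1 for the zero polynomial."""
    return len(trim(p)) - 1


def subtract(p, q):
    """Coefficients of p - q."""
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    return trim([x - y for x, y in zip(p, q)])


def shift(p, r):
    """Coefficients of (S_r p)(n) = p(n + r), via the binomial theorem."""
    out = [0] * len(p)
    for j, c in enumerate(p):
        for i in range(j + 1):
            out[i] += c * comb(j, i) * r ** (j - i)
    return trim(out)


def family_type(family):
    """Type (d, w_d, ..., w_1) of a family of non-constant polynomials."""
    d = max(degree(p) for p in family)
    leading = {i: set() for i in range(1, d + 1)}
    for p in family:
        leading[degree(p)].add(trim(p)[-1])
    return (d,) + tuple(len(leading[i]) for i in range(d, 0, -1))


def is_nice(family):
    """deg(p_1) >= deg(p_i) and p_1 - p_i non-constant for i >= 2."""
    first = family[0]
    return all(
        degree(first) >= degree(q) and degree(subtract(first, q)) >= 1
        for q in family[1:]
    )


def vdc_family(family, p, r):
    """The family (p, r)-vdC(P), listed with S_r p_1 - p first."""
    shifted = [subtract(shift(q, r), p) for q in family]
    plain = [subtract(q, p) for q in family]
    candidates = shifted[:1] + plain + shifted[1:]
    return [q for q in candidates if degree(q) >= 1]  # keep non-constant ones only
```

For $\mathcal{P} = (n^2, n)$, which has type $(2, 1, 1)$, the proof of Lemma 4.5 selects $p = n$. With $r = 3$ the call `vdc_family([[0, 0, 1], [0, 1]], [0, 1], 3)` yields the polynomials $n^2 + 5n + 9$ and $n^2 - n$, a nice family of type $(2, 1, 0)$, which is strictly smaller.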

Proof of Proposition 4.2. It suffices to show that the $k$ given in the statement of Proposition 4.2
depends only on the number $\ell$ and the type $W$ of the family of polynomials involved. This is
the case because if we fix the degree and the cardinality of a family of polynomials, then there
are a finite number of possibilities for its type.

We are going to use induction on the type of the family of polynomials involved. As our
base case we take the case where all the polynomials are linear; then the result was proved in
the previous subsection with $k = \ell + 1$.

Let now $\mathcal{P}$ be a nice ordered family of $\ell$ polynomials with $\deg(p_1) \geq 2$ and type $W$, and
suppose that the statement holds for all nice ordered families of $\ell'$ polynomials with type $W'$
strictly smaller than $W$ for some $k = k(W', \ell') \in \mathbb{N}$.

Let $p \in \mathcal{P}$ be chosen as in Lemma 4.5. Using Corollary 4.3, making the change of variables
$m \to m - p(n)$, and using that $p(b(N))/N \to 0$, exactly as in the proof of Lemma 4.4, we get
that the limsup as $N \to \infty$ of the averages in (21) is bounded by a constant multiple of

$\limsup_{R\to\infty} \mathbb{E}_{r\in[1,R]} \limsup_{N\to\infty} \mathbb{E}_{m\in[1,N]} \Big| \mathbb{E}_{n\in[1,b(N)]} \prod_{i=1}^{\ell} a_{i,N}(m + p_i(n + r) - p(n)) \cdot \overline{a_{i,N}}(m + p_i(n) - p(n)) \Big|,$

where again for notational convenience we have defined $a_{1,N} := a_1$ for $N \in \mathbb{N}$. By Lemma 4.5,
for suitably large $r \in \mathbb{N}$, the family $(p, r)$-vdC$(\mathcal{P})$ is nice, has type strictly smaller than $W$,
and consists of at most $2\ell$ polynomials. Let

$k(W, \ell) = \max_{W' < W,\ \ell' \leq 2\ell} k(W', \ell'),$

where the maximum is taken over all $\ell'$ with $\ell' \leq 2\ell$ and possible types $W'$ with $W' < W$
of families consisting of at most $2\ell$ polynomials (there is a finite number of such possible
types). Using the induction hypothesis and the Cauchy-Schwarz inequality, we get that if
$|\!|\!|a_1|\!|\!|_{U_{k(W,\ell)}(\mathbb{N})} = 0$, then for every large enough $r \in \mathbb{N}$ we have

$\limsup_{N\to\infty} \mathbb{E}_{m\in[1,N]} \Big| \mathbb{E}_{n\in[1,b(N)]} \Big( \prod_{i=1}^{\ell} a_{i,N}(m + p_i(n + r) - p(n)) \cdot \overline{a_{i,N}}(m + p_i(n) - p(n)) \Big) \Big| = 0.$

This completes the induction and the proof. □



4.2. Proof of the main results for polynomial averages. We are now one short step from
proving Theorems 1.3 and 1.4.

Proof of Theorem 1.4. Let $k \in \mathbb{N}$ be such that the conclusion of Proposition 4.1 holds. Without
loss of generality we can assume that $|\!|\!|f_1|\!|\!|_{k,\mu,T_1} = 0$. Then for $\mu$ almost every $x \in X$ we have
$|\!|\!|f_1|\!|\!|_{k,\mu_x,T_1} = 0$. Using Proposition 3.1 we deduce that for $\mu$ almost every $x \in X$ we have
$|\!|\!|(f_1(T_1^n x))_{n\in\mathbb{N}}|\!|\!|_{U_k(\mathbb{N})} = 0$. The result now follows by applying Proposition 4.1 to the sequences
$a_i\colon \mathbb{N} \to \mathbb{C}$ defined by $a_i(n) := f_i(T_i^n x)$, $i = 1, \ldots, \ell$. □

Proof of Theorem 1.3. We assume as we may that $\|f_i\|_{L^\infty(\mu)} \leq 1$ for $i = 1, \ldots, \ell$. Let $k \in \mathbb{N}$
be the integer given by Theorem 1.4. Let $\varepsilon > 0$. For $i = 1, \ldots, \ell$, we use Proposition 3.8 to
get the decomposition $f_i = f^s_{i,\varepsilon} + f^u_{i,\varepsilon} + f^e_{i,\varepsilon}$, where $|\!|\!|f^u_{i,\varepsilon}|\!|\!|_{k,\mu,T_i} = 0$, $\|f^e_{i,\varepsilon}\|_{L^1(\mu)} \leq \varepsilon$, all functions
are bounded by 2, and for $\mu$ almost every $x \in X$, the sequence $(f^s_{i,\varepsilon}(T_i^n x))_{n\in\mathbb{N}}$ is a $(k-1)$-step
nilsequence. Let

$A_N(f_i)(x) := \mathbb{E}_{m\in[1,N],\, n\in[1,b(N)]} f_1(T_1^{m+p_1(n)} x) \cdot \ldots \cdot f_\ell(T_\ell^{m+p_\ell(n)} x).$

Theorem 2.1 implies that the averages $A_N(f^s_{i,\varepsilon})(x)$ converge pointwise. Hence, it suffices to
show that when computing the average $A_N(f_i)(x)$ the contribution of the functions $f^u_{i,\varepsilon}$ and
$f^e_{i,\varepsilon}$ becomes negligible as $N \to \infty$ and $\varepsilon$ is taken suitably small. Theorem 1.4 implies that the
contribution of the functions $f^u_{i,\varepsilon}$ is negligible, independently of the choice of $\varepsilon$. To handle the
contribution of the functions $f^e_{i,\varepsilon}$ we argue as in the proof of the corresponding convergence
result for the cubic averages in Section 3.2. Let us just explain the only point where our
argument deviates slightly from the aforementioned argument. We expand $A_N(f^s_{i,\varepsilon} + f^e_{i,\varepsilon})$ and
write $A_N(f^s_{i,\varepsilon} + f^e_{i,\varepsilon}) - A_N(f^s_{i,\varepsilon})$ as a sum of $2^\ell - 1$ averages. We deal with each such average
separately, and bound all the functions by their sup norm except one (chosen arbitrarily) that
is equal to $f^e_{i,\varepsilon}$ for some $i \in \{1, \ldots, \ell\}$. Upon doing this, we get the bound

$|A_N(f^s_{i,\varepsilon} + f^e_{i,\varepsilon})(x) - A_N(f^s_{i,\varepsilon})(x)| \ll_\ell \max_{i=1,\ldots,\ell} \mathbb{E}_{m\in[1,N],\, n\in[1,b(N)]} |f^e_{i,\varepsilon}|(T_i^{m+p_i(n)} x).$

That the right hand side becomes negligible as $N \to \infty$, and $\varepsilon$ is chosen suitably small, follows
(as in the proof given in Section 3.2) upon noticing that for every system $(X, \mathcal{X}, \mu, T)$, function
$f \in L^\infty(\mu)$, and polynomial $p \in \{p_1, \ldots, p_\ell\}$, one has for $\mu$ almost every $x \in X$ that

$\lim_{N\to\infty} \mathbb{E}_{m\in[1,N],\, n\in[1,b(N)]} |f|(T^{m+p(n)} x) = \int |f|\, d\mu_x,$

where $\mu = \int \mu_x\, d\mu(x)$ is the ergodic decomposition of the measure $\mu$ with respect to $T$. To get
this identity it suffices to make the change of variables $m \to m - p(n)$, use that $p(b(N))/N \to 0$,
and the ergodic theorem. This completes the proof. □